302 PART 6 Analyzing Survival Data

In this chapter, we explain how survival data aren’t like ordinary numerical data and why you need to use specific techniques to analyze them properly. We describe two ways to construct survival curves: the life-table and the Kaplan-Meier methods. We guide you in preparing and interpreting survival curves and show you how to glean useful information from these curves, such as median survival time and five-year survival rates.

Understanding the Basics of Survival Data

To understand survival analysis, you first have to understand survival data. Survival times are intervals between a designated starting time point and the time point at which an event occurs. These intervals can have a specific type of missing data due to a phenomenon called censoring. Because survival data usually include censored data, they must be analyzed in a very specific way to avoid generating biased estimates that lead to incorrect conclusions.
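To make censoring concrete, here is a minimal Python sketch, assuming each subject's record is stored as a follow-up time plus a flag that says whether the event was actually observed (`event=False` meaning censored). The class name `SurvivalRecord`, the function `naive_mean`, and the numbers are all hypothetical. The sketch shows one reason the naive approach is biased: if you average the recorded times as though every one were an event time, the result can only understate the true average, because a censored subject's real survival time is at least as long as the time recorded.

```python
from dataclasses import dataclass


@dataclass
class SurvivalRecord:
    """One subject: time from the starting point, and whether the event was seen."""
    time: float   # days of follow-up (hypothetical units)
    event: bool   # True = event observed; False = censored (true time >= `time`)


def naive_mean(records: list[SurvivalRecord]) -> float:
    """Average the recorded times as if every record were an observed event.

    Because each censored subject's true time is at least its recorded time,
    this value is a lower bound on the mean of the true survival times.
    """
    return sum(r.time for r in records) / len(records)


# Hypothetical data: two observed events and two censored subjects.
records = [
    SurvivalRecord(time=20, event=True),
    SurvivalRecord(time=45, event=False),  # still event-free when last seen
    SurvivalRecord(time=60, event=True),
    SurvivalRecord(time=90, event=False),  # study ended before any event
]

censored = sum(not r.event for r in records)
print(f"{censored} of {len(records)} records are censored")
print(f"Naive mean (biased low): {naive_mean(records):.1f} days")
```

The techniques described later in the chapter use the censored records without pretending they are event times.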

Examining how survival times are intervals

The techniques described in this chapter for summarizing, graphing, and comparing survival data deal with the time interval from a defined starting point to the first occurrence of an endpoint event. The event can be designated as death or a relapse of a particular condition, such as a recurrence of cancer. Or you could designate the event to be surgical removal (called an explant) of a failed mechanical component, such as an artificial heart valve. If a patient’s heart valve was implanted on January 10 (the beginning of the time interval) and their body rejected it, leading to an explant on January 30 (the time of the event), then the time interval from implant to explant is 30 – 10, or 20 days.
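The same subtraction can be done on calendar dates so that month and year boundaries are handled automatically. This is a minimal Python sketch using the standard `datetime` module; the helper name `interval_days` is hypothetical, and the year 2024 is an assumption, since the text gives only the month and day.

```python
from datetime import date


def interval_days(start: date, event: date) -> int:
    """Number of days from the starting point to the event."""
    if event < start:
        raise ValueError("event date precedes the starting date")
    return (event - start).days


implant = date(2024, 1, 10)   # beginning of the time interval
explant = date(2024, 1, 30)   # time of the event
print(interval_days(implant, explant))  # 20

# An interval that crosses a year boundary works the same way.
print(interval_days(date(2023, 12, 20), date(2024, 1, 10)))  # 21
```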

A person can die only once, so survival analysis can obviously be used for one-time events. But other endpoints can occur multiple times, such as having a stroke or having cancer go into remission. The techniques we describe in this chapter analyze only the time to the first occurrence of the event. Handling multiple occurrences of an event requires more advanced survival analysis methods, which are beyond the scope of this book.
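Because these techniques use only the first occurrence, a subject with repeated events has to be reduced to a single interval before analysis. Here is a minimal Python sketch, assuming each subject's event dates are stored in a list. The function name `time_to_first_event`, the `last_seen` parameter, and the dates are hypothetical. The function keeps the earliest event, ignores later ones, and treats a subject with no events as censored at their last follow-up date.

```python
from datetime import date


def time_to_first_event(
    start: date, event_dates: list[date], last_seen: date
) -> tuple[int, bool]:
    """Return (days, event_observed) using only the first occurrence of the event.

    Later occurrences (e.g. a second stroke) are ignored. If no event occurred,
    the subject is censored at `last_seen`, the last date they were followed.
    """
    if event_dates:
        first = min(event_dates)          # earliest occurrence, regardless of list order
        return (first - start).days, True
    return (last_seen - start).days, False


start = date(2024, 3, 1)
strokes = [date(2024, 5, 15), date(2024, 3, 21)]   # hypothetical repeat events
print(time_to_first_event(start, strokes, last_seen=date(2024, 12, 31)))  # (20, True)
print(time_to_first_event(start, [], last_seen=date(2024, 4, 30)))        # (60, False)
```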

The starting point of the time interval is somewhat arbitrary, so it must be defined explicitly every time you do a survival analysis. Imagine that you’re studying the progression of chronic obstructive pulmonary disease (COPD) in a group of patients. If you want to study the natural history of the disease, the starting point can be the diagnosis date. But if you’re instead interested in evaluating the efficacy of a treatment, the starting point can be defined as the date the treatment began.
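Changing the starting point changes the survival time even when the event date stays the same. Here is a minimal Python sketch with hypothetical dates for a single COPD patient. The dates, the dictionary keys, and the unspecified endpoint event are assumptions for illustration. It measures the interval once from the diagnosis date (the natural-history question) and once from the treatment start date (the treatment-efficacy question).

```python
from datetime import date

# Hypothetical dates for one patient; `event` is the endpoint event date.
diagnosis = date(2020, 6, 1)
treatment_start = date(2021, 1, 15)
event = date(2023, 6, 1)

# The origin must be stated explicitly, because it determines the interval.
origins = {
    "natural history (from diagnosis)": diagnosis,
    "treatment efficacy (from treatment start)": treatment_start,
}

intervals = {question: (event - origin).days for question, origin in origins.items()}

for question, days in intervals.items():
    print(f"{question}: {days} days")
```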